AITopics | stochastic linear bandit

Sparsity-Agnostic Linear Bandits with Adaptive Adversaries

Neural Information Processing Systems, Mar-20-2026, 07:42:27 GMT

We study stochastic linear bandits where, in each round, the learner receives a set of actions (i.e., feature vectors), from which it chooses an element and obtains a stochastic reward. The expected reward is a fixed but unknown linear function of the chosen action. We study \emph{sparse} regret bounds that depend on the number $S$ of non-zero coefficients in the linear reward function. Previous works focused on the case where $S$ is known, or where the action sets satisfy additional assumptions. In this work, we obtain the first sparse regret bounds that hold when $S$ is unknown and the action sets are adversarially generated. Our techniques combine online-to-confidence-set conversions with a novel randomized model selection approach over a hierarchy of nested confidence sets. When $S$ is known, our analysis recovers state-of-the-art bounds for adversarial action sets. We also show that a variant of our approach, using Exp3 to dynamically select the confidence sets, can be used to improve the empirical performance of stochastic linear bandits while enjoying a regret bound with optimal dependence on the time horizon.
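
As a rough illustration of the Exp3 ingredient mentioned in the abstract, the following is a minimal sketch of textbook Exp3 over a finite set of options. In the paper's variant the options would be candidate confidence sets; here they are plain arms with Bernoulli rewards, which is an assumption made purely for illustration. The names `exp3`, `bernoulli`, `means`, and the exploration parameter `gamma` are hypothetical, and this is not the authors' algorithm.

```python
import math
import random


def exp3(reward_fn, n_arms, horizon, gamma=0.1, seed=0):
    """Run textbook Exp3 for `horizon` rounds; rewards must lie in [0, 1].

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    pulls = [0] * n_arms
    for _ in range(horizon):
        total = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm, rng)
        # Importance-weighted estimate of the chosen arm's reward.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale so the largest weight is 1; the probabilities are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
        pulls[arm] += 1
    return pulls


# Toy problem (assumed): three Bernoulli arms, the last one is best.
means = [0.2, 0.5, 0.9]


def bernoulli(arm, rng):
    return 1.0 if rng.random() < means[arm] else 0.0


pulls = exp3(bernoulli, n_arms=3, horizon=5000)
print(pulls)
```

On this toy problem the pull counts should concentrate on the last arm, with the `gamma / n_arms` floor keeping a small amount of exploration on the others.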

artificial intelligence, machine learning, proceedings, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.60)

ca460332316d6da84b08b9bcf39b687b-Paper.pdf

Neural Information Processing Systems, Feb-11-2026, 04:11:42 GMT

artificial intelligence, constraint, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)

An Efficient Pessimistic-Optimistic Algorithm for Stochastic Linear Bandits with General Constraints

Neural Information Processing Systems, Dec-24-2025, 22:06:25 GMT

This paper considers stochastic linear bandits with general nonlinear constraints. The objective is to maximize the expected cumulative reward over horizon $T$ subject to a set of constraints in each round $\tau\leq T$. We propose a pessimistic-optimistic algorithm for this problem, which is efficient in two aspects. First, the algorithm yields $\tilde{\cal O}\left(\left(\frac{K^{0.75}}{\delta}+d\right)\sqrt{\tau}\right)$ (pseudo) regret in round $\tau\leq T,$ where $K$ is the number of constraints, $d$ is the dimension of the reward feature space, and $\delta$ is a Slater's constant; and {\em zero} constraint violation in any round $\tau> \tau',$ where $\tau'$ is {\em independent} of horizon $T.$ Second, the algorithm is computationally efficient. Our algorithm is based on the primal-dual approach in optimization and includes two components. The primal component is similar to unconstrained stochastic linear bandits (our algorithm uses the linear upper confidence bound algorithm (LinUCB)). The computational complexity of the dual component depends on the number of constraints, but is independent of the sizes of the contextual space, the action space, and the feature space. Thus, the computational complexity of our algorithm is similar to LinUCB for unconstrained stochastic linear bandits.
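
The abstract names LinUCB as the basis of the primal component. The sketch below shows plain, unconstrained LinUCB only: a ridge-regression estimate plus an exploration bonus, with the inverse design matrix maintained by a Sherman-Morrison update. It leaves out the dual (constraint-handling) component, which the abstract does not describe in enough detail to reproduce. The names `linucb`, `alpha`, `lam`, `noisy_reward`, and the two-action toy problem are assumptions for illustration.

```python
import math
import random


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linucb(action_sets, reward_fn, dim, alpha=1.0, lam=1.0):
    """Unconstrained LinUCB sketch. Returns (chosen indices, final estimate)."""
    # A_inv is the inverse of (lam * I + sum of x x^T); b is the sum of reward * x.
    A_inv = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
             for i in range(dim)]
    b = [0.0] * dim
    chosen = []
    for actions in action_sets:
        theta = matvec(A_inv, b)  # ridge-regression estimate

        def ucb(x):
            # Estimated reward plus a bonus proportional to sqrt(x^T A^{-1} x).
            width = math.sqrt(max(dot(x, matvec(A_inv, x)), 0.0))
            return dot(theta, x) + alpha * width

        idx = max(range(len(actions)), key=lambda k: ucb(actions[k]))
        x = actions[idx]
        r = reward_fn(x)
        # Sherman-Morrison rank-one update of the inverse design matrix.
        Ax = matvec(A_inv, x)
        denom = 1.0 + dot(x, Ax)
        A_inv = [[A_inv[i][j] - Ax[i] * Ax[j] / denom for j in range(dim)]
                 for i in range(dim)]
        b = [bi + r * xi for bi, xi in zip(b, x)]
        chosen.append(idx)
    return chosen, matvec(A_inv, b)


# Toy problem (assumed): two basis actions, the second has the higher mean.
rng = random.Random(0)
theta_star = [0.2, 1.0]
actions = [[1.0, 0.0], [0.0, 1.0]]


def noisy_reward(x):
    return dot(theta_star, x) + rng.gauss(0.0, 0.1)


chosen, theta_hat = linucb([actions] * 200, noisy_reward, dim=2)
print(chosen.count(1), theta_hat)
```

The per-round cost here depends only on the feature dimension and the number of candidate actions, which is the baseline the abstract compares its own complexity against.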

efficient pessimistic-optimistic algorithm, name change, stochastic linear bandit, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.62)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.59)

df5f94d6ac6e13d830d70536cde9f0d2-Paper-Conference.pdf

Neural Information Processing Systems, Aug-19-2025, 12:32:06 GMT

artificial intelligence, data mining, machine learning, (21 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.48)

An Efficient Pessimistic-Optimistic Algorithm for Stochastic Linear Bandits with General Constraints

Neural Information Processing Systems, Aug-17-2025, 09:03:52 GMT

This paper considers stochastic linear bandits with general nonlinear constraints.

artificial intelligence, constraint, machine learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Sparsity-Agnostic Linear Bandits with Adaptive Adversaries

Neural Information Processing Systems, May-27-2025, 00:17:19 GMT

We study stochastic linear bandits where, in each round, the learner receives a set of actions (i.e., feature vectors), from which it chooses an element and obtains a stochastic reward. The expected reward is a fixed but unknown linear function of the chosen action. We study \emph{sparse} regret bounds that depend on the number $S$ of non-zero coefficients in the linear reward function. Previous works focused on the case where $S$ is known, or where the action sets satisfy additional assumptions. In this work, we obtain the first sparse regret bounds that hold when $S$ is unknown and the action sets are adversarially generated.

artificial intelligence, machine learning, sparsity-agnostic linear bandit, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.43)

An Efficient Pessimistic-Optimistic Algorithm for Stochastic Linear Bandits with General Constraints

Neural Information Processing Systems, Jan-19-2025, 04:29:37 GMT

This paper considers stochastic linear bandits with general nonlinear constraints. The objective is to maximize the expected cumulative reward over horizon $T$ subject to a set of constraints in each round $\tau\leq T$. We propose a pessimistic-optimistic algorithm for this problem, which is efficient in two aspects. First, the algorithm yields $\tilde{\cal O}\left(\left(\frac{K^{0.75}}{\delta}+d\right)\sqrt{\tau}\right)$ (pseudo) regret in round $\tau\leq T,$ where $K$ is the number of constraints, $d$ is the dimension of the reward feature space, and $\delta$ is a Slater's constant; and {\em zero} constraint violation in any round $\tau> \tau',$ where $\tau'$ is {\em independent} of horizon $T.$ Second, the algorithm is computationally efficient. Our algorithm is based on the primal-dual approach in optimization and includes two components.

artificial intelligence, constraint-based reasoning, efficient pessimistic-optimistic algorithm, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.62)

Sparsity-Agnostic Linear Bandits with Adaptive Adversaries

Jin, Tianyuan, Jang, Kyoungseok, Cesa-Bianchi, Nicolò

arXiv.org Machine Learning, Jun-3-2024

We study stochastic linear bandits where, in each round, the learner receives a set of actions (i.e., feature vectors), from which it chooses an element and obtains a stochastic reward. The expected reward is a fixed but unknown linear function of the chosen action. We study sparse regret bounds that depend on the number $S$ of non-zero coefficients in the linear reward function. Previous works focused on the case where $S$ is known, or where the action sets satisfy additional assumptions. In this work, we obtain the first sparse regret bounds that hold when $S$ is unknown and the action sets are adversarially generated. Our techniques combine online-to-confidence-set conversions with a novel randomized model selection approach over a hierarchy of nested confidence sets. When $S$ is known, our analysis recovers state-of-the-art bounds for adversarial action sets. We also show that a variant of our approach, using Exp3 to dynamically select the confidence sets, can be used to improve the empirical performance of stochastic linear bandits while enjoying a regret bound with optimal dependence on the time horizon.

algorithm, bandit, sparselinucb, (15 more...)

arXiv.org Machine Learning

2406.01192

Country:

Asia > Singapore (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)

Linear Multi-Resource Allocation with Semi-Bandit Feedback

Neural Information Processing Systems, Mar-13-2024, 01:13:37 GMT

We study an idealised sequential resource allocation problem. In each time step the learner chooses an allocation of several resource types between a number of tasks. Assigning more resources to a task increases the probability that it is completed. The problem is challenging because the alignment of the tasks to the resource types is unknown and the feedback is noisy. Our main contribution is the new setting and an algorithm with nearly-optimal regret analysis. Along the way we draw connections to the problem of minimising regret for stochastic linear bandits with heteroscedastic noise. We also present some new results for stochastic linear bandits on the hypercube that significantly improve on existing work, especially in the sparse case.

algorithm, allocation, bandit, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.94)

Linear bandits with polylogarithmic minimax regret

Lumbreras, Josep, Tomamichel, Marco

arXiv.org Machine Learning, Feb-19-2024

We study a noise model for linear stochastic bandits for which the subgaussian noise parameter vanishes linearly as we select actions on the unit sphere closer and closer to the unknown vector. We introduce an algorithm for this problem that exhibits a minimax regret scaling as $\log^3(T)$ in the time horizon $T$, in stark contrast to the square-root scaling of this regret for typical bandit algorithms. Our strategy, based on weighted least-squares estimation, achieves the eigenvalue relation $\lambda_{\min} ( V_t ) = \Omega (\sqrt{\lambda_{\max}(V_t ) })$ for the design matrix $V_t$ at each time step $t$ through geometrical arguments that are independent of the noise model and might be of independent interest. This allows us to tightly control the expected regret in each time step to be of the order $O(\frac1{t})$, leading to the logarithmic scaling of the cumulative regret.
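
The step from per-round regret of order $1/t$ to logarithmic cumulative regret is a standard harmonic-sum bound. In the worked inequality below, $c$ is a hypothetical constant standing in for whatever factors the paper's analysis hides inside $O(\cdot)$; the second line gives the corresponding bound for per-round regret of order $1/\sqrt{t}$, which is where square-root scaling comes from.

$$\sum_{t=1}^{T} \frac{c}{t} \;\le\; c\left(1 + \int_{1}^{T} \frac{ds}{s}\right) \;=\; c\,(1 + \ln T),$$

$$\sum_{t=1}^{T} \frac{c}{\sqrt{t}} \;\le\; c\left(1 + \int_{1}^{T} \frac{ds}{\sqrt{s}}\right) \;=\; c\,\bigl(2\sqrt{T} - 1\bigr).$$

If the hidden factor itself grows like $\log^2 T$, the first bound is of order $\log^3 T$, which would match the stated rate; that reading is an assumption, since the abstract does not spell out the hidden factors.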

bandit, eigenvalue, inequality, (15 more...)

arXiv.org Machine Learning

2402.12042

Country: